Developing High-resolution Universal Multi-type n-gram Text Similarity Detector

Authors

Yurii Palkovskii

Alexei Belov

Abstract

This paper describes the approaches used for the Plagiarism Detection task at the PAN 2014 International Competition on Uncovering Plagiarism, Authorship, and Social Software Misuse, which took 1st place on test corpus no. 3 (plagdet score 0.907) and 3rd place on test corpus no. 2 (plagdet score 0.868). In this work we aggregated the experience gained in the PAN 2012 and PAN 2013 research works [2] and further improved previously developed plagiarism detection methods [8] with the help of: contextual n-grams, surrounding-context n-grams, named-entity-based n-grams, odd-even skip n-grams, functional-words-frame-based n-grams, a TF-IDF sentence-level similarity index, a noise-sensitive clustering algorithm, and focused summary-type detection heuristics. These components are combined into a single model that marks similarity sections and thus effectively detects different types of obfuscation techniques.
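To make the n-gram idea concrete, the following is a minimal Python sketch of comparing two passages by n-gram overlap. It is not the authors' implementation: the abstract does not define its n-gram types, so the reading of "odd-even skip n-grams" used here (n-grams built separately over even-positioned and odd-positioned words) is an assumption, as are the helper names `tokenize`, `ngrams`, `odd_even_skip_ngrams`, and `jaccard` and the choice of Jaccard overlap as the score.

```python
from typing import List, Set, Tuple

NGram = Tuple[str, ...]


def tokenize(text: str) -> List[str]:
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    words = (w.strip(".,;:!?\"'()[]") for w in text.lower().split())
    return [w for w in words if w]


def ngrams(tokens: List[str], n: int) -> Set[NGram]:
    """Return the set of contiguous word n-grams (empty if too few tokens)."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def odd_even_skip_ngrams(tokens: List[str], n: int) -> Set[NGram]:
    """Assumed interpretation: n-grams over every other word.

    One stream uses the words at even positions, the other the words at
    odd positions, so a single substituted word damages fewer n-grams.
    """
    return ngrams(tokens[0::2], n) | ngrams(tokens[1::2], n)


def jaccard(a: Set[NGram], b: Set[NGram]) -> float:
    """Jaccard overlap of two n-gram sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical example passages with two substituted words.
source = tokenize("The quick brown fox jumps over the lazy dog.")
suspicious = tokenize("The quick red fox jumps over the sleepy dog.")

contiguous_score = jaccard(ngrams(source, 3), ngrams(suspicious, 3))
skip_score = jaccard(odd_even_skip_ngrams(source, 2),
                     odd_even_skip_ngrams(suspicious, 2))
print(f"contiguous 3-gram overlap: {contiguous_score:.3f}")
print(f"odd-even skip 2-gram overlap: {skip_score:.3f}")
```

In this toy case the skip n-grams retain more overlap than the contiguous trigrams after word substitution, which illustrates why combining several n-gram types might help against obfuscation. The real detector combines many more signals than this sketch covers.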

Similar resources

A performance study of the conceptual implementation of the GEM-tracking detector in Monte Carlo simulation

The PANDA experiment (antiProton ANnihilation at DArmstadt) is one of the key projects of the future FAIR facilities, investigating the reactions of antiprotons with protons and nuclear targets. The experiment is designed to offer exceptional physics potential by exploiting the availability of cold, high-intensity antiproton beams. One of the significant parts of the ...

Optimization of an ultra-high-resolution rectangular pixelated parallel-hole collimator with a CZT pixelated semiconductor detector for HiRe-SPECT system

Introduction: In nuclear medicine, pixelated semiconductor detectors such as CZT are of growing interest for introducing new devices. In particular, the spatial resolution can be improved by using a pixelated parallel-hole collimator with hole sizes equal to the pixel sizes of the pixelated detector. The purpose of this study was to compare the effect of pixelated and ...

An Unsupervised Text Normalization Architecture for Turkish Language

A variety of applications dealing with short text messages require a text normalization process that transforms ill-formed words into standard ones. Recently, many successful approaches have been applied to text normalization, especially for social media text. Since each natural language has its own difficulties and barriers, we need to design an architecture to normalize short text messages ...

Study on Automatic Scoring of Descriptive Type Tests using Text Similarity Calculations

In this paper, we evaluate the automatic scoring of a descriptive type test. In the experiments, three test similarity measures are compared in terms of automatic scoring quality. Two of them are BLEU and RIBES, which are n-gram and word-level matching processes respectively, originally used for automatic evaluation of machine translation output. The other similarity process is Doc2Vec, which u...

Arabic Text Classification Using N-Gram Frequency Statistics A Comparative Study

This paper presents the results of classifying Arabic text documents using the N-gram frequency statistics technique employing a dissimilarity measure called the “Manhattan distance”, and Dice’s measure of similarity. The Dice measure was used for comparison purposes. Results show that N-gram text classification using the Dice measure outperforms classification using the Manhattan measure.

Publication date: 2014
